Disclosure Control Methods and Information Loss for Microdata
Abstract
Statistical disclosure control (SDC) seeks to modify statistical data so that they can be published without giving away confidential information that can be linked to specific respondents. The challenge for SDC is to achieve this modification with minimum loss of the detail and accuracy sought by database users. SDC methods for microdata are usually known as masking methods, and a wide range of them exists. In terms of their operating principles, current masking methods fall into two categories (Willenborg and De Waal 2001):
• Perturbative. The microdata set is distorted before publication. As a result, unique combinations of scores in the original dataset may disappear and new unique combinations may appear in the perturbed dataset. This confusion helps preserve statistical confidentiality. The perturbation method should be chosen so that statistics computed on the perturbed dataset do not differ significantly from those that would be obtained on the original dataset.
• Nonperturbative. Nonperturbative methods do not alter the data. Instead, they produce partial suppressions or reductions of detail in the original dataset. Global recoding, local suppression, and sampling are examples of nonperturbative masking.
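The two categories can be made concrete with a small example. The following is a minimal Python sketch that assumes a hypothetical toy microdata set of (age, zip code, income) records. It uses additive Gaussian noise, scaled to the attribute's standard deviation, as one common example of perturbative masking. It uses global recoding of age into 10-year bands, followed by a single pass of local suppression on rare key combinations, as examples of nonperturbative masking. Information loss is approximated here as the absolute change in mean and variance. That measure, the function names, the noise scale `rel_sd`, the band width, and the threshold `k` are all illustrative assumptions, not choices taken from the source.

```python
import random
import statistics
from collections import Counter

# Hypothetical toy microdata: (age, zip_code, income) per respondent.
RECORDS = [
    (23, "08001", 21000.0),
    (25, "08001", 23500.0),
    (37, "08002", 41000.0),
    (38, "08002", 39500.0),
    (52, "08003", 56000.0),
    (71, "08004", 18000.0),
]


def add_noise(values, rel_sd=0.1, seed=0):
    """Perturbative masking (illustrative): add zero-mean Gaussian noise whose
    standard deviation is rel_sd times the attribute's sample standard deviation."""
    rng = random.Random(seed)
    sd = statistics.stdev(values) * rel_sd
    return [v + rng.gauss(0.0, sd) for v in values]


def recode_age(age, width=10):
    """Nonperturbative masking (illustrative): global recoding of an exact age
    into a band such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def local_suppress(keys, k=2):
    """Nonperturbative masking (illustrative): in one pass, replace the zip code
    with '*' in every (age band, zip) combination shared by fewer than k records.
    This does not guarantee that every resulting combination reaches k."""
    counts = Counter(keys)
    return [(band, zip_ if counts[(band, zip_)] >= k else "*") for band, zip_ in keys]


def info_loss(original, masked):
    """Simple information-loss proxy: absolute change in mean and in variance."""
    return (
        abs(statistics.mean(original) - statistics.mean(masked)),
        abs(statistics.variance(original) - statistics.variance(masked)),
    )


if __name__ == "__main__":
    incomes = [income for _, _, income in RECORDS]
    masked_incomes = add_noise(incomes)
    print("income info loss (mean, variance):", info_loss(incomes, masked_incomes))

    keys = [(recode_age(age), zip_) for age, zip_, _ in RECORDS]
    print(local_suppress(keys))
```

On this toy data, the suppression step leaves `("50-59", "*")` and `("70-79", "*")` as unique combinations. A practical method would therefore keep recoding or suppressing until the desired protection level is reached. Each such step trades additional information loss for lower disclosure risk.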
Similar resources
A Quantitative Comparison of Disclosure Control Methods for Microdata
As described in Chapter 5, there is a plethora of statistical disclosure control (SDC) methods to protect microdata. This chapter provides guidance in choosing a particular SDC method by comparing some of the methods discussed in Chapter 5 on the basis of both information loss and disclosure risk. Information loss can be readily quantified using analytical measures (either generic or data-use-s...
Source Data Perturbation in Statistical Disclosure Control
When tables of quantitative data are generated from a datafile, the release of those tables should not reveal information concerning individual respondents. This disclosure of individual respondents in the microdata file can be prevented by applying disclosure control methods at the table level, but this may create inconsistencies across tables. Alternatively, disclosure control methods can be ...
Automatic Generation of Masked Microdata
Disclosure Control is the discipline concerned with the modification of data containing confidential information about individual entities, such as persons, households, businesses, etc. in order to prevent third parties working with these data from recognizing entities in the data and thereby disclosing information about these entities. In very broad terms, disclosure risk is the risk that a gi...
Post-Masking Optimization of the Tradeoff between Information Loss and Disclosure Risk in Masked Microdata Sets
Previous work by these authors has been directed to measuring the performance of microdata masking methods in terms of information loss and disclosure risk. Based on the proposed metrics, we show here how to improve the performance of any particular masking method. In particular, post-masking optimization is discussed for preserving as much as possible the moments of first and second order (and...
An approximate microaggregation approach for microdata protection
Microdata protection is a hot topic in the field of Statistical Disclosure Control, which has gained special interest after the disclosure of 658000 queries by the America Online (AOL) search engine in August 2006. Many algorithms, methods and properties have been proposed to deal with microdata disclosure. One of the emerging concepts in microdata protection is k-anonymity, introduced by Samar...
Publication date: 2001